2023, Dec, Volume 4 Issue 4

www.ijisea.org

ISSN: 2582 - 6379 Orange Publications

# Router in Bi-Network on Chip with an Advanced FIFO Structure

D Sankar Kumar<sup>1</sup>, D Dayakar<sup>2</sup>, D Hema<sup>3</sup>, B Aruna Kumari<sup>4</sup>
PG Scholar<sup>1</sup>, Professor<sup>2</sup>, Assistant Professor<sup>3,4</sup>
Department of Electronics and Communication Engineering<sup>1,2,3,4</sup>
Amrita Sai Institute of Science and Technology,
Paritala, Kanchikacherla, NTR District, Andhra Pradesh, India

# **ABSTRACT**

Network on chip (NoC) has become a promising intercommunication infrastructure for System on Chip (SoC) designs, because traditional methods exhibit severe bottlenecks in communication among processing elements. However, NoC design is complex because of issues with performance metrics such as system scalability, latency, power consumption and signal integrity. This paper discusses the issues of the memory unit in the router and then proposes an advanced memory structure. To obtain efficient data transfer, FIFO buffers are implemented in distributed RAM with virtual channels for an FPGA-based NoC. Advanced FIFO-based memory units are proposed for the NoC router, and their performance is evaluated in a bi-directional NoC (Bi-NoC). The major motivation of this paper is to reduce the burden on the router by improving the internal structure of the FIFO. To increase the data transfer speed, a Bi-NoC with a self-configurable intercommunication channel is proposed. Simulation and synthesis results indicate that the design provides guaranteed throughput, predictable latency and fair network access when compared to recent works.

Keywords: Network on Chip, system on chip, FIFO, FPGA.

# **I. INTRODUCTION**

System on chip (SoC) is a complex interconnection of various functional elements. Its bus-based architecture creates a communication bottleneck in gigabit communication. There was thus a need for a system with explicit modularity and parallelism; network on chip possesses many such attractive properties and solves the problem of the communication bottleneck. It works on the idea of interconnecting cores through an on-chip network.

Communication on a network on chip is carried out by routers, so a better NoC requires an efficiently designed router. This router supports four parallel connections at the same time. It uses store-and-forward flow control and deterministic routing driven by an FSM controller, which improves the performance of the router. The switching mechanism used here is packet switching, which is generally used on networks on chip.




In packet switching, data is transferred in the form of packets between cooperating routers, and an independent routing decision is taken at each router. The store-and-forward flow mechanism is chosen because it does not reserve channels and thus does not leave physical channels idle. The arbiter uses a rotating priority scheme so that every channel gets a chance to transfer its data. In this router both input and output buffering are used so that congestion can be avoided at both sides.
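The rotating priority scheme can be illustrated with a short behavioural sketch in Python. This is only a model under stated assumptions, not the arbiter of the proposed router: the class name `RotatingArbiter` and the rule "the channel after the last winner gets highest priority next" are illustrative choices that the text does not specify.

```python
class RotatingArbiter:
    """Behavioural model of a rotating-priority (round-robin) arbiter.

    Assumption: after a grant, the channel following the winner
    becomes the highest-priority channel for the next decision.
    """

    def __init__(self, n_channels):
        self.n = n_channels
        self.next_first = 0  # channel that is checked first next time

    def grant(self, requests):
        """requests[i] is True when channel i wants to send.

        Returns the index of the granted channel, or None if idle.
        """
        for offset in range(self.n):
            channel = (self.next_first + offset) % self.n
            if requests[channel]:
                # Rotate priority so the winner is checked last next time.
                self.next_first = (channel + 1) % self.n
                return channel
        return None
```

With all channels requesting continuously, the grants cycle through the channels in turn, so no channel is starved.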

A router is a device that forwards data packets across computer networks. Routers perform the data "traffic direction" functions on the Internet. A router is a microprocessor-controlled device that is connected to two or more data lines from different networks. When a data packet comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. Then, using information in its routing table, it directs the packet to the next network on its journey.

The router is a "Four Port Network Router" with one input port, through which packets enter, and three output ports, through which packets are driven out. A packet contains three parts: header, data and frame check sequence. The packet width is 8 bits and the length of the packet can be between 1 byte and 63 bytes. The packet header contains two fields, the destination address (DA) and the length. The DA of the packet is 8 bits wide, and the switch drives the packet to the respective port based on it. Each output port has a unique 8-bit port address. If the destination address of the packet matches the port address, the switch drives the packet to that output port. The length field is 8 bits wide and takes values from 0 to 63; it is measured in bytes. The data is organized in bytes and can take any value. The frame check sequence contains the integrity check of the packet and is calculated over the header and data.
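The packet layout and the address-matching rule can be sketched in Python as follows. The sketch is hedged in two ways: the text does not say how the frame check sequence is computed, so a byte-wise XOR is assumed here purely for illustration, and the helper names `build_packet` and `route` are hypothetical.

```python
from functools import reduce
from typing import Optional, Sequence

MAX_PAYLOAD = 63  # largest data length stated in the text


def fcs(data: bytes) -> int:
    """Assumed check: XOR of all bytes (the algorithm is not given in the text)."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def build_packet(dest: int, payload: bytes) -> bytes:
    """Header (DA, length) + payload + FCS over header and payload."""
    if not 0 <= dest <= 0xFF:
        raise ValueError("destination address must fit in 8 bits")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload longer than 63 bytes")
    body = bytes([dest, len(payload)]) + payload
    return body + bytes([fcs(body)])


def route(packet: bytes, port_addresses: Sequence[int]) -> Optional[int]:
    """Return the index of the output port whose address equals DA.

    Returns None when the check fails or no port address matches.
    """
    if len(packet) < 3:
        return None
    body, check = packet[:-1], packet[-1]
    if fcs(body) != check:
        return None  # corrupted packet is dropped
    dest = body[0]
    for index, address in enumerate(port_addresses):
        if address == dest:
            return index
    return None
```

For example, with port addresses `[0x11, 0x22, 0x33]`, a packet built with `build_packet(0x22, b"\x01\x02\x03")` is six bytes long and is routed to the port at index 1.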

A data packet is typically passed from router to router through the networks of the Internet until it gets to its destination computer. Routers also perform other tasks such as translating the data transmission protocol of the packet to the appropriate protocol of the next network.

# **II. LITERATURE SURVEY**

The very first device that had fundamentally the same functionality as a router does today was the Interface Message Processor (IMP); IMPs were the devices that made up the ARPANET, the first packet network. The idea for a router (called "gateways" at the time) initially came about through an international group of computer networking researchers called the International Network Working Group (INWG). Set up in 1972 as an informal group to consider the technical issues involved in connecting different networks, later that year it became a subcommittee of the International Federation for Information Processing.

These devices were different from most previous packet networks in two ways. First, they connected dissimilar kinds of networks, such as serial lines and local area networks. Second, they were connectionless devices, which had no role in assuring that traffic was delivered reliably, leaving that entirely to the hosts (this particular idea had been previously pioneered in the CYCLADES network). The idea was explored in more detail, with the intention to produce a prototype system, as part of two contemporaneous programs. One was the initial DARPA-initiated program, which created the TCP/IP architecture in use today.

Sometime after early 1974 the first Xerox routers became operational. The first true IP router was developed by Virginia Strazisar at BBN, as part of that DARPA-initiated effort, during 1975-1976. By the end of 1976, three PDP-11-based routers were in service in the experimental prototype Internet. The first multiprotocol routers were independently created by staff researchers at MIT and Stanford in 1981; the Stanford router was done by William Yeager, and the MIT one by Noel Chiappa; both were also based on PDP-11s.




Virtually all networking now uses TCP/IP, but multiprotocol routers are still manufactured. They were important in the early stages of the growth of computer networking, when protocols other than TCP/IP were in use. Modern Internet routers that handle both IPv4 and IPv6 are multiprotocol, but are simpler devices than routers processing AppleTalk, DECnet, IP, and Xerox protocols.

From the mid-1970s and in the 1980s, general-purpose minicomputers served as routers. Modern high-speed routers are highly specialized computers with extra hardware added to speed up both common routing functions, such as packet forwarding, and specialized functions such as IPsec encryption. There is substantial use of Linux- and UNIX-based machines, running open source routing code, for research and other applications.

Cisco's operating system was independently designed. Major router operating systems, such as those from Juniper Networks and Extreme Networks, are extensively modified versions of UNIX software.

# **III. OVERVIEW OF NETWORK ON CHIP**

With the growth of computation-intensive applications and the need for low-power, high-performance systems, the number of computing resources on a single chip has increased enormously, because current VLSI technology can support such extensive integration of transistors. When many computing resources such as CPUs, DSPs and specific IPs are added to build a System-on-Chip, the interconnection between them becomes another challenging issue. In most System-on-Chip applications, a shared bus interconnection, which needs arbitration logic to serialize several bus access requests, is adopted to communicate with each integrated processing unit because of its low cost and simple control characteristics. However, such a shared bus interconnection is limited in its scalability because only one master at a time can use the bus, which means all bus accesses must be serialized by the arbiter. Therefore, in an environment where the number of bus requesters is large and their required bandwidth exceeds what the bus provides, other interconnection methods should be considered.

Such a scalable bandwidth requirement can be satisfied by using an on-chip packet-switched micro-network of interconnects, generally known as Network-on-Chip (NoC) architecture. The basic idea came from traditional large-scale multi-processors and distributed computing networks. The scalable and modular nature of NoCs and their support for efficient on-chip communication lead to NoC-based system implementations. Even though current network technologies are well developed and their supporting features are excellent, their complicated configurations and implementation complexity make them hard to adopt as an on-chip interconnection methodology. To suit typical SoCs or multi-core processing environments, the basic modules of the network interconnection, such as switching logic, routing algorithm and packet definition, should be lightweight so that they result in easily implementable solutions.

# **IV. PROPOSED METHODOLOGY**

The router operates on packets. It drives each incoming packet from the input port to one of the output ports based on the address contained in the packet. The router has one input port, through which packets enter, and three output ports, through which packets are driven out. The router has an active-low synchronous input, resetn, which resets the router.






Fig. 1: Block diagram of the four-port router



Fig. 2: Typical structure of a bi-directional NoC



Fig. 3: Structure of Bi-NoC with virtual channel allocation

Linear feedback shift registers are used for generating PN sequences. D flip-flop components are used for this, since structural modeling is used. To generate the sequence, it is first necessary to initialize the flip-flops to a particular value. Since a 15-bit-long PN sequence is used, four flip-flops are required, and these four flip-flops must be initialized. For that purpose, init signals are used. After the initialization, the XOR feedback logic provides a method to generate a PN sequence. Orthogonal sequences are required in this system. Time-shifted versions of a PN sequence are nearly orthogonal. To shift the sequences, shift registers are used, with the sequence given as input to the registers. The outputs of the intermediate flip-flops are taken, which are time shifted. At the output of the PN generator, four PN sequences are therefore obtained.
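A minimal Python model of such a generator is sketched below. The text does not state which flip-flop outputs feed the XOR gate, so the taps used here (the last two flip-flops, corresponding to the polynomial x^4 + x^3 + 1) and the initial value `(1, 0, 0, 0)` are assumptions; they were chosen because they give the full period of 15. The function name `pn_sequences` is hypothetical.

```python
def pn_sequences(init=(1, 0, 0, 0), length=15):
    """Simulate a 4-stage LFSR and return one bit sequence per flip-flop.

    Each returned list is the output of one flip-flop, so the four
    lists are time-shifted copies of the same PN sequence.
    """
    state = list(init)
    if not any(state):
        raise ValueError("an all-zero initial state locks the LFSR")
    outputs = [[] for _ in state]
    for _ in range(length):
        for tap, bit in enumerate(state):
            outputs[tap].append(bit)
        feedback = state[3] ^ state[2]   # XOR feedback (taps are assumed)
        state = [feedback] + state[:-1]  # shift by one flip-flop
    return outputs
```

Mapping bits 0/1 to -1/+1, the correlation of the sequence with any non-trivial cyclic shift of itself is -1 over the 15-bit period, compared with 15 at zero shift, which is the sense in which the shifted sequences are nearly orthogonal.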


International Journal for Interdisciplinary Sciences and Engineering Applications
IJISEA - An International Peer- Reviewed Journal

The CDMA router has M transmit/receive ports. The main difference between the overloaded and classical CDMA routers is that M > N - 1 for the former due to channel overloading. Each PE is connected to two network interfaces (NIs), the transmit and receive NI modules.

During packet transmission from a PE, the packet is divided into flits that are stored in the transmit NI first-in first-out (FIFO) buffer. The router arbiter then selects at most M winning flits from the top of the NI FIFOs to be transmitted during the current transaction. The selected flits must all have exclusive destination addresses to prevent conflicts, and a winner from two conflicting flits is selected according to the router's priority scheme. The employed priority scheme is a fixed winner-takes-all scheme; only one of the conflicting transmitters is given a spreading code and is acknowledged to start encoding. Once done, the router assigns CDMA codes to each transmit and receive NI. NIs with empty FIFOs or conflicting destinations are assigned all-zero CDMA codes so that they do not contribute MAI to the CDMA channel sum. Afterward, flits from each NI are spread by the CDMA codes in the encoder module.
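The spreading step can be illustrated with a small Python sketch. It models only the classical case with orthogonal Walsh codes, not the overloaded scheme, and it is an assumption that Walsh codes are the code family in use; the function names are hypothetical. An idle NI is modelled by an all-zero code, which adds nothing to the channel sum.

```python
def walsh_codes(n):
    """Rows of an n x n Hadamard matrix (n must be a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def spread(bit, code):
    """Map the bit to -1/+1 and multiply it by every chip of the code."""
    symbol = 1 if bit else -1
    return [symbol * chip for chip in code]


def channel_sum(chip_streams):
    """Add the chip streams of all transmitters chip by chip."""
    return [sum(chips) for chips in zip(*chip_streams)]


def despread(summed, code):
    """Correlate the channel sum with one code and decide the bit."""
    correlation = sum(s * c for s, c in zip(summed, code))
    return 1 if correlation > 0 else 0
```

With codes of length N = 4, the all-ones row is left unused and the remaining N - 1 = 3 codes carry one bit each per transaction; each receiver recovers its own bit because the codes of the other transmitters correlate to zero.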

If a NoC router has a larger FIFO buffer, throughput is higher and network latency is lower, since fewer flits stall in the network [20]. Nevertheless, there is a limit to increasing the FIFO depth. Since each communication has its own peculiarities, sizing the FIFO for the worst-case communication scenario compromises not only routing area but power as well [6]. However, if the router has a small FIFO depth, latency is larger and quality of service (QoS) can be compromised.

The proposed solution is a heterogeneous router in which each channel can have a different buffer size. If a channel has a lower communication rate than its neighbour, it may lend some of its unused buffer slots. Under a different communication pattern, the roles may be reversed or changed at run time, without a redesign step. The architecture sustains performance because, statistically, not all buffers are used all the time. A channel can lend part or all of its buffer slots according to the requirements of the neighbouring buffers. To reduce connection costs, each channel may only use the available buffer slots of its right and left neighbour channels. In this way, each channel may have up to three times as many buffer slots as its original buffer, whose size is defined at design time.

Fig. 4 shows the original and proposed input FIFO. Compared with the original, the new proposal uses more multiplexers to allow reconfiguration. Fig. 4(b) presents the South channel as an example. According to this figure, each channel has five multiplexers. Two of them control the input and output of data; they have a fixed size, independent of the buffer size. The other three control the read and write process of the FIFO, and their size increases with the depth of the buffer. These multiplexers are controlled by the FSM of the FIFO.

To reduce routing and extra multiplexers, we adopted the strategy of changing the control part of each channel. Some rules were defined to enable a channel to use buffers from adjacent channels. When a channel fills its entire FIFO, it can borrow more buffer words from its neighbours. The channel first asks the right neighbour for buffer words and, if it still needs more, tries to borrow from the left neighbour's FIFO. For this purpose, some signals of each channel must be sent to the neighbouring channels to control the flits stored there.
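The borrowing rule (own slots first, then the right neighbour, then the left neighbour) can be modelled behaviourally in Python. This is a simplified sketch, not the multiplexer-level design: it assumes the channels are arranged in a ring so that every channel has a right and a left neighbour, it ignores how a lender reclaims slots it needs itself, and the class name `SharedFifoRouter` is hypothetical.

```python
from collections import deque


class SharedFifoRouter:
    """Input FIFOs that may borrow free slots from adjacent channels."""

    def __init__(self, n_channels, depth):
        self.n = n_channels
        self.free = [depth] * n_channels  # free physical slots per channel
        # Each queue keeps (flit, owner) pairs in arrival order.
        self.queues = [deque() for _ in range(n_channels)]

    def push(self, channel, flit):
        """Store a flit for `channel`; return False when no slot is reachable."""
        right = (channel + 1) % self.n
        left = (channel - 1) % self.n
        for owner in (channel, right, left):  # own, then right, then left
            if self.free[owner] > 0:
                self.free[owner] -= 1
                self.queues[channel].append((flit, owner))
                return True
        return False

    def pop(self, channel):
        """Remove the oldest flit of `channel` and release its slot."""
        if not self.queues[channel]:
            return None
        flit, owner = self.queues[channel].popleft()
        self.free[owner] += 1  # the slot goes back to the channel that owns it
        return flit
```

With four channels of depth 2, a single busy channel can hold 6 flits, three times its design-time depth, while flits still leave in arrival order.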

The direction request is transferred to the virtual channel allocation (VA) stage to select the associated virtual channel of the neighbour router. Because several requests may target the same virtual channel, contention can occur among data flits. Flits that request a virtual channel remain in the current router while the previous router is blocked by the contention. Once a data flit passes the VA stage, it is assigned to the SA stage, which provides access to the physical channel towards the neighbour router. Because the virtual channels are multiplexed onto the buffer, waiting data packets never block other data packets that are ready to be transferred to the destination through the physical channel. Fig. 3 describes the data transfer with the help of virtual channels when the physical channel is busy or blocked in Bi-NoC. This structure associates multiple virtual channels with the physical channel and its buffer, thereby increasing throughput and avoiding deadlock. The flow through the virtual channels of a router, from input port to output port, is shown in Fig. 3. An incoming high-priority flit that arrives at the neighbour router first accesses the appropriate virtual channel; thereafter the entire data packet is processed. The first incoming flit of a data packet is the head flit, which arrives at the top of the virtual channel queue of the buffer and then enters the RC stage. It is decoded in the RC stage, which creates the direction request towards the destination router. The direction request of the flit is transferred to the VA stage to obtain a virtual channel towards the destination router. Contention may occur among data packets with direction requests towards the destination router when they use the same virtual channel.

Data packets that have not obtained a virtual channel wait in the VA stage and start transferring once the current flit has reached the next router, thereby avoiding contention failures. Because all virtual channels are multiplexed onto one buffer queue, no flit can block other data packets that are able to route through the physical channel. In a typical NoC structure, routers communicate through unidirectional channels, whereas in Bi-NoC data can be transferred in either direction on any channel, which improves bandwidth utilization. To configure channels dynamically, a channel control module is added to each directional channel. The proposed design can use each channel as either an input or an output; therefore, the width of the channel request from the RC stage is doubled.

Two bi-directional channels can be requested for data transfer in each output direction, which decreases contention by sending data packets in the same direction simultaneously. Hence, the channel control module has two functions: dynamic configuration and maintaining the channel request. As a bi-directional channel is shared by a pair of neighbour routers, each output transition is authorized by the channel control protocol of the two routers. The channel control protocol is implemented as an FSM module to obtain higher efficiency. The other responsibility is maintaining the channel state as blocked or unblocked, which depends on the status of the channel. When the channel is available, the arbiter sends the request to the SA module to process the channel allocation [16]. The highlight of this structure is the replacement of unidirectional channels with bi-directional ones, which enhances channel utilization and flexibility without requiring additional bandwidth.
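The utilization benefit of reversible channels can be illustrated with a toy Python model of two routers, A and B, joined by two channels. It is a sketch under simplifying assumptions that are not taken from the paper: each channel carries one flit per cycle, direction changes cost nothing, and the policy (one channel per side when both sides have traffic, both channels to the only active side otherwise) merely stands in for the FSM-based channel control protocol. The function name `cycles_to_drain` is hypothetical.

```python
def cycles_to_drain(a_to_b, b_to_a, bidirectional):
    """Count cycles until both routers have sent all their flits.

    a_to_b / b_to_a: number of flits waiting on each side.
    bidirectional: if False, each channel has a fixed direction.
    """
    cycles = 0
    while a_to_b > 0 or b_to_a > 0:
        if bidirectional and b_to_a == 0:
            # Only A has traffic: point both channels from A to B.
            send_a, send_b = min(a_to_b, 2), 0
        elif bidirectional and a_to_b == 0:
            # Only B has traffic: point both channels from B to A.
            send_a, send_b = 0, min(b_to_a, 2)
        else:
            # One channel per direction (the fixed unidirectional case).
            send_a, send_b = min(a_to_b, 1), min(b_to_a, 1)
        a_to_b -= send_a
        b_to_a -= send_b
        cycles += 1
    return cycles
```

In this model, one-sided traffic of 8 flits takes 8 cycles with fixed directions and 4 cycles with reversible channels, while balanced traffic takes the same time in both cases, using the same two physical channels.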

# V. RESULTS AND DISCUSSION

# RTL








#### INTERNAL BLOCK



#### **SIMULATION RESULTS**



# **VI. CONCLUSION**

An advanced FIFO structure based NoC is simulated and synthesized in Xilinx ISE 14.7 and implemented on a Virtex-6 FPGA device to analyze the performance in terms of occupied area, latency, power consumption and throughput. A single router is designed first, followed by a mesh-based NoC to assess the memory utilization of the FPGA. Fig. 4 shows the Register Transfer Level (RTL) schematic of a single NoC router, which is composed of input and output ports, an arbiter, a crossbar and channel control modules. The figure also shows the memory utilization of each component individually. Each module of the NoC is designed separately in the Verilog Hardware Description Language (HDL) and then integrated into one module. The advanced queued buffer is designed for both the typical NoC and the bi-directional NoC so that the two designs can be compared easily. The simulation results are analyzed for area utilization (number of occupied slices, LUT-FF pairs and slice registers), latency in terms of delay, maximum operating frequency, power consumption in terms of dynamic power dissipation, memory utilization in terms of number of RAMs and, finally, throughput in terms of flits per second per node. The results describe the performance of the NoC router in terms of area, delay and power consumption, obtained by implementing the proposed design on the FPGA. From Fig. 5, it is clear that the proposed design shows less area overhead, because the queued buffer is shared between neighbour routers and data flits are used to transfer data packets between source and destination. The number of RAMs is also lower, because only active components use buffers whereas idle modules do not use RAMs. The delay of the proposed design is lower, and correspondingly the operating frequency is higher, because more channels (both physical and virtual) are available between source and destination. The total power consumption is slightly higher than in existing work, because the virtual channels increase dynamic power consumption during data packet transfer.

NoC is a solution for SoC intercommunication that avoids the limitations of parallel communication wires and removes the barriers of bus-based communication. In this paper, an advanced memory unit is proposed and implemented in Bi-NoC to reduce the memory requirement of the buffer and to achieve high performance in terms of maximum operating bandwidth. Compared to previous work, the proposed work improves delay by approximately 28% and resource utilization by approximately 17%. As RingNet [15] uses a round-robin arbiter, its resource utilization is higher than that of the proposed work. A data packet is divided into a number of flits, and the queued buffer is shared between neighbour routers, so less buffer space is required when data is transferred as flits from source to destination. This advanced router design is integrated in a Bi-NoC configuration to achieve higher data transfer speed than a typical NoC. Virtual channels are created between routers when a data flit is blocked because the physical channel is not available; data packet latency is therefore reduced and deadlock is avoided. The implementation results show improved resource utilization compared with existing work. In future, NoC-based processors will be used in Artificial Intelligence applications. The performance of the NoC needs to be improved further by advancing the router components, because the virtual channels in the advanced FIFO structure increase power consumption.

Many future work directions are inspired by this paper, including exploiting the mathematical properties of the code space to find additional non-orthogonal codes and boost the CDMA interconnect capacity, and exploring more architectural optimizations of the OCI crossbar. Studying the robustness of CDMA interconnects and techniques for enhancing it will be one of the first future research points. Moreover, we plan to investigate using OCI-based routers in different network topologies, evaluate their performance using standard benchmarks, and study their suitability for various applications.

# **REFERENCES:**

[1]W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," in Proceedings of the 38th Design Automation Conference, pp. 684–689, Las Vegas, Nev, USA, June 2001.

[2]Shin, E. S., Mooney III, V. J., & Riley, G. F. (2002, October). Round-robin arbiter design and generation. In Proceedings of the 15th international symposium on System Synthesis (pp. 243-248). ACM.

[3]Ashok Kumar K, Dananjayan P, "A survey on Silicon on Chip Communication", Indian Journal of Science and Technology, ISSN: 0974-5645, Vol-10, Issue-1, pp.1-10, January 2017.

[4]Raparti, V. Y., & Pasricha, S. (2019). Approximate NoC and Memory Controller Architectures for GPGPU Accelerators. IEEE Transactions on Parallel and Distributed Systems.

[5]Goebel, M., Behnke, I., Elhossini, A., & Juurlink, B. (2018, May). An Application-Specific Memory Management Unit for FPGA-SoCs. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 222-225). IEEE.

[6]Jang, H., Han, K., Lee, S., Lee, J. J., & Lee, W. (2019). MMNoC: Embedding Memory Management Units into Network-on-Chip for Lightweight Embedded Systems. IEEE Access, 7, 80011-80019.

[7]Ashok Kumar K, Dananjayan P, "Parallel Overloaded CDMA Crossbar for Network on Chip," Facta Universitatis, Series: Electronics and Energetics, Vol-32, Issue-1, pp. 105-118, March 2019.




[8]Halawani, Y., Mohammad, B., Homouz, D., Al-Qutayri, M., & Saleh, H. (2013, December). Embedded memory design using memristor: Retention time versus write energy. In 2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS) (pp. 41-44). IEEE.

[9]Ashok Kumar K, Dananjayan P, "Improvement of Code Utilization CDMA for On-Chip Communication Architecture using Orthogonal Gold Code," 3rd International Conference on Inventive Computation Technologies, organized by RVS Technical Campus, Coimbatore, India, pp.567-571, 15&16th November, 2018.

[10]Chen, X., Lu, Z., Jantsch, A., & Chen, S. (2010, March). Supporting distributed shared memory on multi-core network-on-chips using a dual microcoded controller. In 2010 Design, Automation & Test in

[11]Chung, E. S., Hoe, J. C., & Mai, K. (2011, February). CoRAM: an in-fabric memory architecture for FPGA-based computing. In Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays (pp. 97-106).

[12] Su, N., Gu, H., Wang, K., Yu, X., & Zhang, B. (2018). A highly efficient dynamic router for application oriented network on chip. The Journal of Supercomputing, 74(7), 2905-2915.

[13] Jang, H., Han, K., Lee, S., Lee, J. J., & Lee, W. (2019). MMNoC: Embedding Memory Management Units into Network-on-Chip for Lightweight Embedded Systems. IEEE Access, 7, 80011-80019.

[14] Gordon-Ross, A., Abdel-Hafeez, S., & Alsafrjalni, M. H. (2019, July). A One-Cycle FIFO Buffer for Memory Management Units in Manycore Systems. In 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (pp. 265-270). IEEE.

[15] Siast, J., Łuczak, A., & Domański, M. (2019). RingNet: A Memory-Oriented Network-On-Chip Designed for FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27(6), 1284-1297.

[16] Yusuf, B. B., Maqsood, T., Rehman, F., & Madani, S. A. (2021). Energy Aware Parallel Scheduling Techniques for Network-on-Chip Based Systems. IEEE Access, 9, 38778-38791.


